Execution-replay (ER) is well known in the literature but has been restricted to special system architectures
for many years. Improved hardware resources and the maturity of virtual machine technology promise to
make ER useful for a broader range of development projects.

This paper describes an approach to create a practical, generic ER infrastructure for desktop PC systems 
using virtual machine technology. In the created VM environment arbitrary application programs will run
and be replayed unmodified; neither instrumentation nor recompilation is required.
1 Introduction

The concept of execution replay (ER) has been known in the literature for many years [Hus02]. Its first
step is to bring the system into a well-defined and reproducible initial state. During the following
system execution all non-deterministic events (stimuli to the system) like interrupts, user input,
moments of scheduling etc. are recorded. This allows the program to be re-run identically, in particular
for debugging purposes.

As the replay re-executes each single instruction, there is the opportunity to analyse its behaviour 
in all details and single step through it. This is in stark contrast to conventional logging mechanisms 
(printf) that only record a small subset of the program's execution. One of the biggest advantages of 
ER is that all timing-dependencies are recorded during the execution phase. Therefore it is possible 
to debug time-critical, multi-threaded code in non-real-time. 

Despite the simple principle, implementations of ER pose substantial problems and until now
were only available for specialized areas, e.g. for message-passing systems [RBdK00]. By creating a
deterministic, replay-able virtual machine environment, it will be possible to use execution replay 
for a broad range of development projects. By using a VM emulating a complete personal computer, 
the development of all software for this system can benefit from execution replay. 

The rest of this paper is organized as follows: options to implement ER in generic (personal) computer
systems are discussed in Section 2. Concluding that virtual machines offer many advantages,
a particular example, the User-Mode-Linux (UML) virtual machine, is introduced in Section 3.
Section 4 explains how the UML VM can be adapted to ER and gives some consideration to relevant
hardware support in current microprocessor architectures. The paper is concluded by an outlook on
promising directions for further research.

In M. Ronsse, K. De Bosschere (eds), proceedings of the Fifth International Workshop on Automated Debugging
(AADEBUG 2003), September 2003, Ghent. COmputer Research Repository (http://www.acm.org/corr/), cs.SE/yymmnnn; whole
proceedings: cs.SE/0309027.
1 E-mail: o.oppitz@ieee.org

2 Environments for Execution Replay 

For implementing execution replay, a system or subsystem to be recorded must be chosen. This can be a 
complete computer, an operating system process or a module of a program. In all cases the boundary 
of the system has to be defined precisely because all data going into the system needs to be recorded 
in the execution phase. During the replay this data is used to stimulate the system identically. 

For practical reasons it is difficult to record all input that a personal computer receives: it would
be necessary to attach hardware probes, which leads to problems with timing inaccuracies, signal
noise etc. Also, parts of this system are not deterministic: for example, the exact timing of hard disk
accesses cannot be reproduced in a replay.

Alternatively one may record the inputs to a subsystem like the CPU and the memory system. In
1991 this was shown to work using specialized custom hardware in [BG91]. However, the approach
does not seem applicable to modern PC systems, because the relevant signals can hardly be accessed
on the main boards and, if possible at all, the hardware will be expensive.

In contrast, the software domain avoids many of these problems. A natural system to record is an
OS process, because this is a unit of processing that is (mostly) independent of other processes.
All data going into the process, like command line arguments and file input, needs to be recorded.
Interactions with the OS, like signals and system calls, also need to be tracked. Unfortunately this
interface is quite complicated for real-world operating systems. For parallel programs, interactions
with other user or system processes need to be recorded as well (including shared memory accesses).
Major adaptations to the operating system would be necessary for this, which would make it hard
to guarantee a faithful replay.

Another implementation option is to design an operating system with execution replay in mind
like the Asterix kernel [TPS]. This operating system records all input to the machine (e.g. at the
driver level) and records scheduling events. However, it is tailored for embedded systems and not
compatible with standard operating systems.

The solution proposed in this paper is a hybrid between the conceptual simplicity of recording a 
complete machine and the beauty of the software world: a deterministic, replay-able virtual machine 
executing a standard operating system. 

3 Introducing the User-Mode-Linux VM 

A virtual machine has favourable properties for execution replay. It has a relatively simple structure, 
at least compared to an operating system. As it is pure software it does not suffer from problems of 
the physical domain. And last but not least it can be adapted to operate deterministically. 

As a basis for further research and to illustrate the principle, the open-source User-Mode-Linux
VM [Dik] by Jeff Dike was chosen. Due to its special virtualization approach, UML may only execute
Linux programs. However, there are other virtual machines like [VMw] emulating a complete
personal computer at the register-transfer level. These VMs can boot native operating systems like
Linux or Windows from the original installation CDs and prove that this approach to ER is generally
applicable.

The virtualization scheme used by UML is to port the Linux operating system to a virtual UML
architecture. Inside this virtual machine there is a Linux guest kernel that executes regular, unmodified
Linux binaries. When such a binary executes a privileged instruction like I/O or a system call,
this access is detected by the host kernel and redirected to special handler routines in the UML binary.
Also interrupts and exceptions generated in the UML are trapped and executed in user mode.
Therefore all UML code executes in user mode (hence the name) and no modifications to the host
kernel are necessary. The resources of the host operating system are only accessed through (virtual)
hardware drivers. For example virtual hard disks are mapped to files of the host system, the keyboard
is mapped to the host keyboard etc. Also network devices are supported so that a UML can access
local and remote networks.

4 Creating a Deterministic VM 

Currently UML does not support execution replay. Therefore it is the goal of the author to enhance 
it to deterministically execute, replay and debug arbitrary software. The key to this is to record all 
stimuli and their respective timing. 

Input stimuli to the VM are either delivered via interrupts or via virtual device drivers. Interrupts 
for the VM are implemented through signals created by the host OS. Like interrupts, signals are 
asynchronous to the program flow of the guest processes. Therefore during the execution phase the 
moments when they occur need to be recorded. The simplest and most accurate measure for this is 
the number of instructions executed until the moment when an interrupt occurs. 
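
As a minimal sketch of this recording step, the following Python fragment logs each asynchronous
signal together with the instruction count at which it arrived. The names InterruptLog, record and
deltas, as well as the counts and signal numbers, are hypothetical and chosen only for illustration;
they are not part of UML.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class InterruptLog:
    """Hypothetical log of (instruction_count, signal_number) pairs."""
    entries: List[Tuple[int, int]] = field(default_factory=list)

    def record(self, instruction_count: int, signal_number: int) -> None:
        # The instruction counter only moves forward, so counts must not decrease.
        if self.entries and instruction_count < self.entries[-1][0]:
            raise ValueError("instruction count went backwards")
        self.entries.append((instruction_count, signal_number))

    def deltas(self) -> List[Tuple[int, int]]:
        """Number of instructions to execute between consecutive events."""
        result = []
        previous = 0
        for count, signal in self.entries:
            result.append((count - previous, signal))
            previous = count
        return result


log = InterruptLog()
log.record(1_000, 14)  # illustrative signal number after 1000 instructions
log.record(4_500, 29)  # illustrative signal number after 4500 instructions
```

Storing absolute counts keeps the log simple, while the differences returned by deltas() are the
values a replay mechanism would count down between two injected events.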

In principle executed instructions can be counted in software by instrumenting branch instructions
and calculating the number of executed instructions from the current program counter (PC).
However, this requires modifications to all executable code (kernel and applications), which is an
obstacle for regular usage. Hardware instruction counters solve this problem by counting the executed
(retired) instructions transparently in the background (see footnote 2).
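
The software variant can be illustrated with a toy model. The sketch below assumes fixed-width
instructions and that the instrumentation calls a hook at every taken branch; the class
SoftwareCounter and its methods are hypothetical and only show how a count can be derived from
branch bookkeeping plus the current PC.

```python
INSTRUCTION_SIZE = 4  # assumption: fixed-width instructions, in bytes


class SoftwareCounter:
    """Toy model: instrumented branches add the length of the finished block."""

    def __init__(self, entry_pc: int) -> None:
        self.completed = 0            # instructions in fully executed blocks
        self.block_start = entry_pc   # address where the current block began

    def on_branch(self, branch_pc: int, target_pc: int) -> None:
        # Hook called by the (hypothetical) instrumentation at a taken branch.
        # The finished block spans block_start up to and including the branch.
        self.completed += (branch_pc - self.block_start) // INSTRUCTION_SIZE + 1
        self.block_start = target_pc

    def executed(self, current_pc: int) -> int:
        # Instructions executed before the one located at current_pc.
        offset = (current_pc - self.block_start) // INSTRUCTION_SIZE
        return self.completed + offset
```

For example, starting at 0x1000 with a taken branch at 0x1008 to 0x2000, executed(0x2008) yields 5:
three instructions in the first block plus two in the second. Variable-length instruction sets such as
x86 would need a per-block instruction table instead of the division used here.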

A number of CPU implementations including Intel Pentium III, Pentium IV, AMD Athlon, Itanium,
and PowerPC provide configurable counters for counting retired instructions. Surprisingly
these counters are far from accurate. For the x86 architectures the processor documentation states a
number of cases where they count incorrectly. In theory this can be compensated for, but tests by the
author established that there are further inaccuracies that render the counters unusable. For the Itanium
no tests could be performed, but the processor documentation does not give any guarantees about
correct counter values either.

Besides the x86 implementations, the Motorola PowerPC MPC7441 was evaluated. Tests with an
Apple eMac confirmed its counters to be accurate. A minor exception is interrupt handling, which makes
the counter overcount: at each switch to and from the interrupt handler, the instruction counter is
incremented erroneously by one. As this behaviour is deterministic, it can be compensated for by
subtracting the number of switches to/from supervisor mode (this event can also be counted in
hardware).
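
The compensation amounts to a single subtraction. The following sketch assumes that both values
have already been read from two hardware counters; the function name and the example numbers are
hypothetical.

```python
def corrected_instruction_count(raw_count: int, mode_switches: int) -> int:
    """Remove the one-instruction overcount per switch to/from supervisor mode."""
    if mode_switches < 0 or raw_count < mode_switches:
        raise ValueError("inconsistent counter values")
    return raw_count - mode_switches


# One interrupt causes two switches: into the handler and back out of it.
corrected = corrected_instruction_count(raw_count=10_002, mode_switches=2)
```

With a raw reading of 10002 and two mode switches, the corrected count is 10000.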

Useful features of the MPC are that the counters can be configured to increment only in user and/or
supervisor mode, and to count only instructions executed by a specially marked process. This way it
is possible to count only the instructions executed by the UML virtual machine. When the VM accesses
services of the host OS, these requests are performed in kernel mode (after a system call) and the
instructions are not counted. This is essential because the host operating system gives no guarantees
that it will execute the same number of instructions during the replay. Similarly, any paging operations
performed by the host OS are performed in kernel mode and are thus transparent to the instruction
counter of the UML.

During the replay the so-called performance monitor interrupt is used. It creates (physical) interrupts
after a pre-recorded number of executed instructions. By diverting these interrupts to special
replay handlers in the UML, its interrupts and signals can be replayed.
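
The replay principle can be sketched in software as follows. Instead of a hardware performance
monitor interrupt, this toy loop polls a counter before every instruction, so it only illustrates the
ordering guarantee, not the actual mechanism. The function replay and its callback parameters are
hypothetical.

```python
from typing import Callable, List, Tuple


def replay(events: List[Tuple[int, int]],
           step: Callable[[], None],
           deliver: Callable[[int], None],
           total_instructions: int) -> None:
    """Re-execute total_instructions steps, injecting the recorded signals.

    events holds (instruction_count, signal_number) pairs sorted by count.
    """
    pending = list(events)
    executed = 0
    while executed < total_instructions:
        # Deliver every signal that was recorded at exactly this count.
        while pending and pending[0][0] == executed:
            deliver(pending.pop(0)[1])
        step()  # execute one guest instruction
        executed += 1


trace: List[str] = []
replay(events=[(2, 14), (3, 29)],
       step=lambda: trace.append("insn"),
       deliver=lambda sig: trace.append(f"sig{sig}"),
       total_instructions=4)
```

After this call, trace contains two instructions, the first signal, one instruction, the second signal,
and a final instruction, which is the interleaving that was recorded.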

Besides recording the moment of interrupts, input from external devices also needs to be stored
and replayed. For UML this is done in a straightforward manner by extending the (virtual) device
drivers. The attractiveness of using a VM for execution replay is mostly due to the fact that this driver
interface is small and well-defined.
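
One way to picture such a driver extension is a thin wrapper around the read routine of a virtual
device. The sketch below is an assumption about how this could look, not UML code: the class
ReplayableDevice, its constructor arguments and the in-memory list used as a log are all hypothetical.

```python
from typing import Callable, List


class ReplayableDevice:
    """Hypothetical wrapper around the read routine of a virtual device."""

    def __init__(self, read_from_host: Callable[[], bytes],
                 recorded: List[bytes], replaying: bool) -> None:
        self._read_from_host = read_from_host
        self._recorded = recorded
        self._replaying = replaying
        self._position = 0

    def read(self) -> bytes:
        if self._replaying:
            # Replay phase: serve logged data and never touch the host.
            data = self._recorded[self._position]
            self._position += 1
            return data
        # Execution phase: forward to the host and log what came back.
        data = self._read_from_host()
        self._recorded.append(data)
        return data
```

During the execution phase the wrapper is created with replaying=False and fills the log; during
the replay the same log is passed in with replaying=True, so the guest sees identical input even if
the host device is absent.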



2 In [TH00], Thane and Hansson introduce another software approach measuring elapsed physical time with a defined
resolution and inserting breakpoint instructions into the object code. This is much simpler than instrumenting branches, but there
are (rare) cases where the approach fails to generate a correct replay.
The described setup allows the virtual machine's behaviour, including that of its applications, to be
recorded and reproduced. In order to debug an application, a modified debugger is needed that attaches
to a process within the UML VM. Normally a debugger is supported by the operating system to attach
to a process. In this case the guest OS cannot possibly be aware of the debugger, as it merely replays
a recorded instruction sequence. Thus the debugger (executing in the host OS) must be aware of
the structure and memory layout of the guest OS in order to read the stacks and register contents, set
breakpoints, single-step through the code etc. It is intended to adapt gdb or a kernel debugger
(kgdb) for this purpose.



5 Outlook 

To be generically applicable, the overhead in the execution phase must be small enough to execute the
program with sufficient speed. Many application programs, in particular GUI-driven software,
have very moderate performance requirements and run satisfactorily on low-end machines. Therefore
the author assumes that a high-end machine will have enough (processor) resources to record
the input stimuli in the background without slowing down the program too much. If this assumption
holds, the chosen approach will allow the ER concept to be applied to nearly all areas of software
development without modifications to the program or system.

As this paper illustrates, the base technology for making ER a reality is available today: a generic 
virtual machine environment and software or hardware instruction counters. It is the aim of the 
author to combine these technologies for further research in this area and to prove its viability. 

In the long run, improved hardware instruction counters may allow simplified implementations
on more architectures. Special hardware extensions dedicated solely to execution replay are also
conceivable. These may include support for virtual machines (as known from mainframe systems like
the IBM S/390), automatic recording of data read from peripheral registers, or saving of time stamps for
interrupts. With careful design, such hardware extensions might even perform all recording stealthily
in the background and thus eliminate the so-called probe effect. This would allow ER to be applied to all
software executing on such a machine.